Skip to content

Add explicit FieldValue canonicalization #1178

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Feb 1, 2020

Conversation

schmidt-sebastian
Copy link
Contributor

Adds a new format for canonical IDs of field values that is meant to be verifiable and stable.

@codecov
Copy link

codecov bot commented Jan 30, 2020

Codecov Report

❗ No coverage uploaded for pull request base (mrschmidt/rewritefieldvalue@9354294). Click here to learn what that means.
The diff coverage is 96.15%.

Flag Coverage Δ Complexity Δ
#FirebaseFirestore 62.59% <96.15%> (?) 2375 <20> (?)
#FirebaseFirestore_Ktx 41.17% <ø> (?) 0 <ø> (?)
Impacted Files Coverage Δ Complexity Δ
...le/firebase/firestore/model/value/ProtoValues.java 92.26% <96.15%> (ø) 82 <20> (?)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9354294...51d8684. Read the comment docs.

}
}

private static void buildTimestampCanonicalId(StringBuilder builder, Timestamp timestamp) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildTimestampCanonicalId reads weirdly and is the third form for names in this file:

  • numberEquals
  • compareNumber
  • buildNumberCanonicalId

It seems like these could all have the same shape.

Ideas:

  • make these verbless, i.e. have

    private static void canonicalId(StringBuilder builder, Value value) ...

    and these helpers can then just drop the "build" prefix.

  • use a verb like "canonify" and then these become e.g. "canonifyTimestamp".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't bring myself to use canonify or canonicalize. I used stringify, but am open to feedback, even if that means canonify.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"stringify" doesn't really tie into canonical ID though and I think that's probably more important. I don't have any better suggestions though, so feel free to put it back to makeCanonicalId or whatever it was.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will rename these methods to use "canonify" instead. Apparently other people are using this terminology too: https://docs.cfengine.com/docs/3.12/reference-functions-canonify.html

}

private static void buildGeoPointCanonicalId(StringBuilder builder, LatLng latLng) {
builder.append(String.format("{lat:%s,lng:%s}", latLng.getLatitude(), latLng.getLongitude()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indistinguishable from a user-supplied object that happens to have these parameters. While JSON-like seems useful we could avoid collisions altogether if we made this look like an object constructor instead.

Something like geo(lat:%s,lng:%s) or even just geo(%s, %s).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to keep the IDs short. geo(%s, %s) certainly is short enough and solves the ambiguity.

private static void buildObjectCanonicalId(StringBuilder builder, MapValue mapValue) {
builder.append("{");
boolean first = true;
for (Map.Entry<String, Value> entry : mapValue.getFieldsMap().entrySet()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you intending to make it an invariant that MapValues are always sorted by key? I don't recall seeing this in your other PR adding protovalue.ObjectValue.Builder.

If not, we probably need to sort the strings here :-(. I think maintaining the invariant is probably better though.

Note that server-supplied MapValues already conform to this invariant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MapValues are backed by a LinkedHashMap, and keeping this sorted in ObjectValue.Builder would be pretty expensive. Even though it feels like the wrong thing to do here, I don't see a good solution that doesn't involve resorting here. I experimented with sorted views, but in the end we probably don't need to go too crazy as large maps in Query constraints are hopefully uncommon.

@wilhuff wilhuff assigned schmidt-sebastian and unassigned wilhuff Jan 30, 2020
// Even though MapValue are likely sorted correctly based on their insertion order (e.g. when
// received from the backend), local modifications can bring elements out of order. We need to
// re-sort the elements to ensure that canonical IDs are independent of insertion order.
SortedMap<String, Value> sortedMap = new TreeMap<>(mapValue.getFieldsMap());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Building a binary tree to sort the entries is pretty wasteful in terms of memory allocation. It's also pessimistic because building the tree is always going to cost O(n lg(n)). An alternative would be to collect the keys into an ArrayList and sort that. That has the benefit of allocating less memory and also runs in O(n) if the list is already sorted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through different implementations for optimize for both runtime complexity and readability. While I still think that using large Maps here is kind of an anti-pattern (especially given our lack of truncation support), adding an extra line here to pre-sort the keys (and performing a key-based lookup on the HashMap) might be a worthy tradeoff. Updated.

@wilhuff wilhuff assigned schmidt-sebastian and unassigned wilhuff Jan 31, 2020
Copy link
Contributor

@wilhuff wilhuff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

// received from the backend), local modifications can bring elements out of order. We need to
// re-sort the elements to ensure that canonical IDs are independent of insertion order.
List<String> keys = new ArrayList<>(mapValue.getFieldsMap().keySet());
Collections.sort(keys);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can make this keys.sort() if you want.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL: List.sort() is API Level 24.

@wilhuff wilhuff removed their assignment Feb 1, 2020
@schmidt-sebastian schmidt-sebastian merged commit e4b4850 into mrschmidt/rewritefieldvalue Feb 1, 2020
@schmidt-sebastian schmidt-sebastian deleted the mrschmidt/canonical branch February 7, 2020 00:22
schmidt-sebastian added a commit to firebase/firebase-js-sdk that referenced this pull request Feb 27, 2020
@firebase firebase locked and limited conversation to collaborators Mar 2, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants